This notebook shows how shapefiles and datasets with longitude and latitude data can be manipulated and visualized.
Components
Considerations and Assumptions
Definition of Terms
Challenges
Objectives
Transform the shapefile and dataset for analysis
Spatially join information from the region shapefile to each data point
Render a choropleth map based only on the shapefile
Data Sources
import pandas as pd
import json
import geopandas as gpd
import plotly.graph_objects as go
Import dataset
dataset_df = pd.read_json('data/sample_set.json')
dataset_df
(Exploratory Data Analysis)
The table shows that the dataset has geographic coordinates in the columns attributes.location_longitude and attributes.location_latitude.
The dataset needs to be converted to a GeoDataFrame so that these coordinate columns become point geometries.
Reference
Remark
# points_from_xy takes x (longitude) first, then y (latitude); EPSG:4326 is WGS84 lon/lat
dataset_gdf = gpd.GeoDataFrame(dataset_df, geometry=gpd.points_from_xy(dataset_df['attributes.location_longitude'], dataset_df['attributes.location_latitude']), crs='EPSG:4326')
dataset_gdf
dataset_gdf.plot()
(Exploratory Data Analysis)
The table now has an added geometry column.
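Because the argument order of `points_from_xy` is easy to get backwards, here is a minimal sketch on a hypothetical two-row DataFrame (the `lon`/`lat` column names and the coordinates are made up for illustration). The x argument is longitude and the y argument is latitude; swapping them does not raise an error, it just places every point somewhere else.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical sample rows; column names and values are illustrative only
df = pd.DataFrame({
    "lon": [121.0, 125.6],
    "lat": [14.6, 7.1],
    "values": [3.0, 5.0],
})

# x = longitude, y = latitude, interpreted in WGS84 (EPSG:4326)
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
    crs="EPSG:4326",
)

print(gdf.geometry.iloc[0])
```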
The shapefile also needs to be read into a GeoDataFrame for further processing.
regions_gdf = gpd.read_file('data/ph_regions.shp')
regions_gdf
regions_gdf.crs
regions_gdf.plot()
(Exploratory Data Analysis)
The EDA above shows that the shapefile's CRS is EPSG:4326, which matches the CRS assigned to the dataset in Step 1, so both layers can be spatially joined without reprojection.
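The CRS comparison can also be done in code before any spatial operation. This is a minimal sketch, assuming a case where the two layers do not match: the toy `points` and `polygons` frames and the EPSG:3857 storage CRS are hypothetical. Spatial joins compare raw coordinates, so one layer is reprojected with `to_crs` to match the other.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical inputs: points in WGS84, polygons stored in Web Mercator
points = gpd.GeoDataFrame(geometry=[Point(121.0, 14.6)], crs="EPSG:4326")
polygons = gpd.GeoDataFrame(
    {"REGION": ["A"]},
    geometry=[box(120.0, 14.0, 122.0, 15.0)],
    crs="EPSG:4326",
).to_crs("EPSG:3857")

# Bring both layers into the same CRS before joining
if polygons.crs != points.crs:
    polygons = polygons.to_crs(points.crs)

print(polygons.crs)
```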
The shapefile needs to be converted to GeoJSON for visualization.
regions_json = json.loads(regions_gdf.to_json())
Output of regions_json (truncated):
{'type': 'FeatureCollection',
'features': [{'id': '0',
'type': 'Feature',
'properties': {'REGION': 'Autonomous Region of Muslim Mindanao (ARMM)'},
'geometry': {'type': 'MultiPolygon',
'coordinates': [[[[119.46694183349618, 4.586939811706523],
(Exploratory Data Analysis)
From the JSON above, each entry under the features key has the keys id, type, properties, and geometry, and the REGION attribute is nested inside properties.
Remark: This is important when setting up the choropleth map.
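The nesting can be checked directly. This sketch uses a hand-written, hypothetical GeoJSON fragment shaped like `regions_json` (a unit square stands in for the real MultiPolygon) and a hypothetical helper, `resolve_key`, that follows a dotted path such as `properties.REGION` through nested dictionaries. It is an illustration of the lookup, not plotly's own code.

```python
# Hypothetical fragment shaped like regions_json; geometry simplified to a square
geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "id": "0",
            "type": "Feature",
            "properties": {"REGION": "Region A"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
            },
        }
    ],
}


def resolve_key(feature, dotted_key):
    """Follow a dotted path such as 'properties.REGION' through nested dicts."""
    value = feature
    for part in dotted_key.split("."):
        value = value[part]
    return value


feature = geojson["features"][0]
print(sorted(feature.keys()))  # ['geometry', 'id', 'properties', 'type']
print(resolve_key(feature, "properties.REGION"))  # Region A
```

Note that `REGION` is not a top-level key of the feature, so a bare `"REGION"` path fails with a `KeyError`.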
A spatial join attaches the attributes of each region polygon to the data points that fall inside it.
# 'predicate' is the geopandas >= 0.10 name for the former 'op' argument
dataset_x_region = gpd.sjoin(dataset_gdf, regions_gdf, predicate='within')
dataset_x_region
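To see what the join keeps and drops, here is a minimal sketch on hypothetical toy data: two adjacent unit squares as regions and three points, one of which lies outside both squares. It uses `predicate="within"` (the argument older geopandas releases called `op`) and the default inner join, which discards points that fall in no region.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical regions: two adjacent unit squares
regions = gpd.GeoDataFrame(
    {"REGION": ["West", "East"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Hypothetical points: one inside each region, one outside both
points = gpd.GeoDataFrame(
    {"values": [10.0, 20.0, 30.0]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5.0, 5.0)],
    crs="EPSG:4326",
)

# Each point inherits the REGION of the polygon it lies within;
# how="inner" drops the point at (5, 5)
joined = gpd.sjoin(points, regions, how="inner", predicate="within")
print(joined[["values", "REGION"]])
```

Points that are silently dropped this way do not contribute to any region's aggregate, so it can be worth comparing `len(joined)` against the number of input points.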
A GeoDataFrame supports the usual pandas DataFrame operations. For choropleth mapping, the point values must be aggregated to one value per region to generate the color scale.
aggregated_data = dataset_x_region[['REGION', 'values']].groupby('REGION').mean().reset_index()
aggregated_data
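Since each region gets exactly one color, the per-point values collapse to one row per region. This sketch runs the same mean aggregation on a hypothetical joined table and, as an illustrative addition not in the original cell, uses pandas named aggregation to keep a point count next to the mean so sparsely sampled regions are easy to spot.

```python
import pandas as pd

# Hypothetical output of the spatial join: one row per data point
joined = pd.DataFrame({
    "REGION": ["West", "West", "East"],
    "values": [10.0, 20.0, 40.0],
})

# One row per region: mean for the color scale, count for context
aggregated = joined.groupby("REGION", as_index=False).agg(
    values=("values", "mean"),
    n_points=("values", "count"),
)
print(aggregated)
```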
As noted in the EDA remark of Objective 1, Step 4, the GeoJSON data nests 'REGION' under properties.
Thus the featureidkey string should have the format "properties.<key>", here "properties.REGION".
A common mistake is to leave out the "properties" prefix, because it does not appear when viewing the shapefile's GeoDataFrame table.
Remember that the shapefile must be converted to GeoJSON for plotly and Mapbox to read the polygons. The prefix is needed because featureidkey is resolved against the GeoJSON, not against the GeoDataFrame.
The GeoDataFrame is only a preparatory step for converting the shapefile to GeoJSON.
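A quick sanity check for the missing-prefix mistake: every value passed as `locations` should equal a value found at the featureidkey path in some feature; regions that match nothing do not appear on the map. This sketch uses hypothetical data and two hypothetical helpers, `feature_ids` and `unmatched_locations`, and only mimics the lookup rather than calling plotly.

```python
def feature_ids(geojson, featureidkey):
    """Collect the value found at a dotted featureidkey path in every feature."""
    ids = set()
    for feature in geojson["features"]:
        value = feature
        try:
            for part in featureidkey.split("."):
                value = value[part]
        except (KeyError, TypeError):
            continue  # this feature has nothing at that path
        ids.add(value)
    return ids


def unmatched_locations(locations, geojson, featureidkey):
    """Return the locations that no feature matches under featureidkey."""
    return sorted(set(locations) - feature_ids(geojson, featureidkey))


# Hypothetical GeoJSON and locations (geometry omitted for brevity)
geojson = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "id": "0", "properties": {"REGION": "West"}, "geometry": None},
    {"type": "Feature", "id": "1", "properties": {"REGION": "East"}, "geometry": None},
]}
locations = ["West", "East"]

print(unmatched_locations(locations, geojson, "properties.REGION"))  # []
print(unmatched_locations(locations, geojson, "REGION"))  # ['East', 'West']
```

In the notebook this would be called with `aggregated_data['REGION']` and `regions_json`; a non-empty result signals a wrong featureidkey or mismatched region names.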
# Read the Mapbox access token (the "light" style requires one) and close the file
with open(".mapbox_token") as f:
    token = f.read().strip()
fig = go.Figure(
go.Choroplethmapbox(
geojson=regions_json,
featureidkey='properties.REGION',
locations=aggregated_data['REGION'],
z=aggregated_data['values'],
colorscale="Viridis"
)
)
fig.update_layout(mapbox_style="light", mapbox=dict(accesstoken=token))
# Refer to Snapshots